COMPUTATIONAL PROBLEMS IN THE THEORY OF DYNAMIC PROGRAMMING

By 

Richard  Bellman 


§1. Introduction.

In recent years, as multi-stage processes have come to assume a role
of greater and greater importance in the industrial and economic
arena, a number of interesting and novel mathematical problems have
arisen, many of formidable caliber. The theory of dynamic programming
was created to furnish an approach to these problems. The essential
aim of the theory is to translate these questions from the unfamiliar
field of policies, strategies, programming and scheduling, and such
seeming imponderables, into functional equations which can be attacked
by the precise techniques of analysis. These equations are, however,
nonlinear in general, and possess the usual feature of problems which
occur in applications, namely resolute and impartial insolubility.

Since a theory that has pretensions of application stands or falls
upon its ability to produce numbers, it is of paramount importance to
derive approximate techniques which may be used to determine numerical
solutions.

In the following pages we shall consider a simple problem involving a
sequence of decisions, first formulating it in classical form and then
in terms of the dynamic programming approach. We shall then use this
resultant functional equation to illustrate a number of approximation
techniques, employing the particularly important concept of
approximation in strategy space.

In closing we shall mention briefly some problems of more complicated
form to which the same techniques are applicable.

§2. Optimal Allocation.

As a simple example of a large class of problems that occur in
applications, let us consider the following. We are given a quantity
x > 0 that may be divided into two parts y and x-y. From y we obtain a
return of g(y), and from (x-y) a return of h(x-y). In so doing we
expend a certain amount of our original resources and are left with a
new quantity ay + b(x-y), where a and b are positive constants less
than one, with which to continue the process. How does one proceed to
maximize the total return obtained over N stages?

The conventional approach to this problem begins by listing the
allocations y_1, y_2, ..., y_N at the first, second, ..., and Nth
stages. The total return from this sequence of choices will be

(1)  J(y_1, y_2, \ldots, y_N) = \sum_{i=1}^{N} g(y_i) + \sum_{i=1}^{N} h(x_i - y_i),

where the variables are constrained by the conditions

(2)  x_1 = x,
     x_2 = a y_1 + b(x_1 - y_1),
     \ldots
     x_N = a y_{N-1} + b(x_{N-1} - y_{N-1}),
     0 \le y_i \le x_i,  i = 1, 2, \ldots, N.

The problem is now to maximize J subject to the above restrictions.
Since several of the optimal y_i may be boundary points, and in some
cases all are boundary points, an unrestricted use of calculus is not
possible.

We are now confronted with a problem possessing the typical nasty
features of maximization problems over N-dimensional regions.
Furthermore, we observe that solving the problem in its present form
yields too much information in the sense that we determine y_1, y_2,
..., and y_N simultaneously, whereas all that is actually required to
carry out the process is y_1 as a function of x and the number of
stages remaining.

In the next section we shall formulate the problem from that point of
view.
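The classical formulation of this section can be made concrete with a small sketch. The function names, the coarse grid of allocation fractions, and the sample g and h in the usage note are illustrative assumptions, not part of the paper. The first function evaluates J for a given sequence of allocations, and the second searches all sequences on the grid, which costs (number of fractions)^N evaluations and so shows why the N-dimensional search becomes unattractive.

```python
from itertools import product
import math

def total_return(x, ys, g, h, a, b):
    """Evaluate J for the allocations ys, updating x stage by stage."""
    total = 0.0
    for y in ys:
        if not 0.0 <= y <= x + 1e-12:
            raise ValueError("each allocation must satisfy 0 <= y <= x")
        total += g(y) + h(x - y)
        x = a * y + b * (x - y)   # quantity left for the next stage
    return total

def brute_force(x, n_stages, g, h, a, b,
                fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every sequence y_i = t_i * x_i with t_i from a coarse grid.

    The number of sequences is len(fractions) ** n_stages.
    Returns (best total return, best tuple of fractions).
    """
    best = (-math.inf, None)
    for ts in product(fractions, repeat=n_stages):
        xi, total = x, 0.0
        for t in ts:
            y = t * xi
            total += g(y) + h(xi - y)
            xi = a * y + b * (xi - y)
        best = max(best, (total, ts))
    return best
```

For instance, with the assumed choices g(y) = y, h(u) = 2u and a = b = 0.5, `brute_force(1.0, 2, ...)` selects the fractions (0.0, 0.0), that is, everything is placed in the second activity at both stages.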

§3. Functional Equation Approach.

Let us define

(3)  f_N(x) = total return obtained from N stages using an optimal
              policy.

It is clear that the maximum over-all return is a function only of the
initial amount x and the number of stages remaining.

If the initial allocation is y, the total return will be
g(y) + h(x-y) plus the return from the succeeding (N-1) stages. Since
it is easily seen that an optimal policy must have the property that
its continuation after the first stage must be optimal with respect to
the new initial amount ay + b(x-y) and the remaining (N-1) stages, we
obtain as the total return due to an initial choice of y

(4)  R_N(y) = g(y) + h(x-y) + f_{N-1}(ay + b(x-y)).

Since we wish to maximize the total return, y is now chosen to
maximize this, yielding the functional equation

(5)  f_N(x) = \max_{0 \le y \le x} R_N(y)
            = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_{N-1}(ay + b(x-y)) ],

for N \ge 2, with

(6)  f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) ].

We shall assume henceforth that g and h are continuous functions in
the interval [0, x], so that the maxima are all assumed.

We have thus replaced the original problem, as described in (1) and
(2) of §2, by the sequence of recurrence relations in (5) and (6)
above. Although these recurrence relations are nonlinear, the region
of variation is one-dimensional. To justify this transformation of the
original problem, we must show that these equations above can be
utilized to yield both theoretical and numerical results.
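A minimal numerical sketch of the recurrence relations (5) and (6) follows. It tabulates f_1, ..., f_N on a uniform grid in x, searches y over equally spaced trial points, and uses linear interpolation for f_{N-1} between grid points. The grid sizes, the interpolation scheme, and all function names are assumptions made for illustration. Each stage is a one-dimensional search, in contrast with the N-dimensional search of §2.

```python
import math

def interp(grid, values, x):
    """Piecewise-linear interpolation of tabulated values on a uniform grid."""
    m = len(grid) - 1
    step = grid[-1] / m
    i = min(int(x / step), m - 1)
    t = (x - grid[i]) / step
    return (1 - t) * values[i] + t * values[i + 1]

def stage_values(g, h, a, b, x_max, n_stages, m=200, k=200):
    """Tabulate f_1, ..., f_N on [0, x_max].

    m is the number of grid intervals in x, k the number of trial
    intervals in y. Because 0 < a, b < 1, the new quantity
    a*y + b*(x - y) never leaves [0, x_max].
    """
    grid = [x_max * i / m for i in range(m + 1)]
    tables, prev = [], None
    for _ in range(n_stages):
        cur = []
        for x in grid:
            best = -math.inf
            for j in range(k + 1):
                y = x * j / k
                val = g(y) + h(x - y)
                if prev is not None:   # stages 2, 3, ... add the continuation
                    val += interp(grid, prev, a * y + b * (x - y))
                best = max(best, val)
            cur.append(best)
        tables.append(cur)
        prev = cur
    return grid, tables
```

With the assumed test functions g(y) = y, h(u) = 2u and a = b = 0.5, the tables reproduce f_1(x) = 2x and f_2(x) = 3x, which can be checked by hand.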
§4. A Preliminary Approximation.

Let us begin by making a preliminary approximation that N is infinite.
In place of the system of recurrence relations of (5) and (6) of §3 we
obtain one functional equation

(7)  f(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f(ay + b(x-y)) ],

where

(8)  f(x) = [remainder illegible in the source].
This is justified by the following result:

Theorem 1. Consider equation (7) and assume that

(9)  (a) g(0) = h(0) = 0,
     (b) 0 < a, b < 1,
     (c) g(x) and h(x) are continuous and monotone increasing,
     (d) \sum_{n=0}^{\infty} g(c^n x) < \infty,  \sum_{n=0}^{\infty} h(c^n x) < \infty,

where c = Max(a, b).

Under these conditions there is a unique solution which is continuous
in x and possesses the value 0 at 0.

If

(10)  f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) ],
      f_{n+1}(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_n(ay + b(x-y)) ],

we have

(11)  f(x) = \lim_{n \to \infty} f_n(x).

For the proof of this and the five results stated below, we refer to
[2] and [6].

§5. Approximation Techniques: Successive Approximations.

Let us write our functional equation in the form

(12)  f = T(f, P),

where f represents the unknown function, T represents the
transformation \max_{0 \le y \le x} [ g(y) + h(x-y) + f(ay + b(x-y)) ],
and P represents the set of known parameters, the functions g(x) and
h(x) and the constants a and b, that appear in T.

In theory there is only one method to be used in approximating the
solution of a functional equation, namely the technique of solving an
approximate functional equation. It is in the choice of these
approximate equations that practice varies.

The method of successive approximations in its usual guise relies upon
solving the following system of equations:

(13)  f_{n+1} = T(f_n, P),

where f_1 is a guess at the solution. In more refined applications,
(13) is replaced by

(14)  f_{n+1} = R T(f_n, P),

where R is a transformation so chosen as to force f_n to possess
certain desired properties or to increase the rapidity of convergence.

A simple way to proceed is to mimic the physical process and take

(15)  f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) ]

and

(16)  f_{n+1}(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_n(ay + b(x-y)) ].

The computations are quite easy to perform and possess the merit of
furnishing useful information at the same time. However, it is not to
be expected that the convergence will be very rapid initially.
Consequently, we shall investigate some other procedures.
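The iteration that mimics the physical process can be sketched as follows, under the assumption that f is tabulated on a uniform grid with linear interpolation and that the sweep stops once successive tables differ by less than a tolerance. The names `solve_limit`, `tol`, and `max_iter` are illustrative and do not come from the paper. Starting from the zero function makes the first sweep coincide with the one-stage return.

```python
def solve_limit(g, h, a, b, x_max, m=100, k=100, tol=1e-8, max_iter=1000):
    """Iterate f_{n+1} = T(f_n, P) on a grid until the sweeps agree.

    Returns (grid, table of the last iterate, number of sweeps used).
    """
    grid = [x_max * i / m for i in range(m + 1)]
    step = x_max / m

    def interp(values, x):
        i = min(int(x / step), m - 1)
        t = (x - grid[i]) / step
        return (1 - t) * values[i] + t * values[i + 1]

    f = [0.0] * (m + 1)   # zero start: the first sweep yields f_1
    for n in range(1, max_iter + 1):
        new = []
        for x in grid:
            new.append(max(
                g(y) + h(x - y) + interp(f, a * y + b * (x - y))
                for y in (x * j / k for j in range(k + 1))))
        gap = max(abs(u - v) for u, v in zip(new, f))
        f = new
        if gap < tol:
            break
    return grid, f, n
```

For the assumed example g(y) = y, h(u) = 2u, a = b = 0.5, the limit is f(x) = 4x and the gap between sweeps halves each time, so roughly thirty sweeps are needed for a tolerance of 1e-8. This geometric rate is the slow initial convergence the text warns about.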
§6. Approximation Techniques: Simplified Equation.

In place of the above approximation, we may approximate by replacing
the equation f = T(f, P) by

(17)  f = T(f, P'),

where P' represents a different set of parameters, one which permits a
solution in toto, or which yields a stronger hold on the solution.

Thus, for example, in our equation

(18)  f(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f(ay + b(x-y)) ],

we make the further assumption that g and h are convex functions. The
following result then holds:

Theorem 2. If g and h are convex functions and the conditions of
Theorem 1 hold, an optimal policy consists of choosing y = 0 or x.

Although this simplifies the finding of the solution, it is still not
easy to find an explicit solution.
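Under the hypotheses of Theorem 2 only the two endpoint allocations need to be compared at each stage, so the N-stage return can be computed by a short recursion. The sketch below is an illustration under that assumption; the function name and the sample convex g and h in the usage note are not from the paper, and the code does not itself verify convexity.

```python
from functools import lru_cache

def endpoint_return(g, h, a, b, n_stages):
    """N-stage return when each stage uses y = x or y = 0 only.

    Valid as the optimal return only when Theorem 2 applies
    (g and h convex, with g(0) = h(0) = 0).
    """
    @lru_cache(maxsize=None)
    def f(n, x):
        if n == 0:
            return 0.0
        all_in_g = g(x) + h(0.0) + f(n - 1, a * x)   # choice y = x
        all_in_h = g(0.0) + h(x) + f(n - 1, b * x)   # choice y = 0
        return max(all_in_g, all_in_h)

    return lambda x: f(n_stages, x)
```

With the assumed functions g(y) = y^2, h(u) = 2u^2 and a = 0.9, b = 0.5, the two-stage return from x = 1 is 2.62: it pays to use the first activity at stage one, because it preserves more of the resource, and the second activity at the final stage.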

If we wish to obtain an explicit approximate solution, we can make the
further approximation that

(19)  g(x) = c x^d,  h(x) = e x^f.

This corresponds to the approximation of log g(x) by a linear function
of log x, that is, of log g(e^x) by log c + dx.


P-423 


In the case where g and h have the simple forms given in (19) above,
we have the following result:

Theorem 3. The solution of

(20)  f(x) = Max [ c x^d + f(ax),  e x^f + f(bx) ]

subject to

(21)  (a) 0 < a, b < 1;  c, e > 0,
      (b) 0 < d < f,

is given by

(22)  f(x) = c x^d + f(ax),   0 \le x \le x_0,
           = e x^f + f(bx),   x_0 \le x,

where

(23)  x_0 = \left[ \frac{c/(1 - a^d)}{e/(1 - b^d)} \right]^{1/(f-d)}.
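The switching point in Theorem 3 can be recovered from the two-regime form of the solution. The derivation below takes that form as given rather than proving it, and reads the exponent f as a constant distinct from the function f(x). It is offered as a consistency check under those assumptions.

```latex
% Below x_0 the first alternative is used at every later stage,
% because ax < x keeps the state below x_0:
f(x) = \sum_{n \ge 0} c\,(a^{n}x)^{d} = \frac{c\,x^{d}}{1-a^{d}},
\qquad 0 \le x \le x_0 .

% At x = x_0 both alternatives give the same value, and b x_0 < x_0,
% so the continuation after the second alternative is known from above:
\frac{c\,x_0^{d}}{1-a^{d}}
  = e\,x_0^{f} + \frac{c\,b^{d}\,x_0^{d}}{1-a^{d}}
\;\Longrightarrow\;
x_0^{\,f-d} = \frac{c\,(1-b^{d})}{e\,(1-a^{d})}
            = \frac{c/(1-a^{d})}{e/(1-b^{d})} .
```

Since d < f, the first alternative dominates for small x and the second for large x, which agrees with the ordering of the two regimes in the statement of the theorem.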

In the general case where g and h are convex and we know that y = 0 or
x at each stage, partial results similar to the above can be found. It
would be interesting to know under what further assumption in addition
to convexity one has a solution similar to the above.

If g and h are not convex, but concave, we may use the following
result to obtain a solution:

Theorem 4. Let

(24)  (a) g(0) = h(0) = 0,
      (b) g'(x), h'(x) \ge 0  for x \ge 0,
      (c) g''(x), h''(x) < 0  for x > 0,

and consider the sequence of approximations to f defined by

(25)  f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) ],
      f_{n+1}(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_n(ay + b(x-y)) ],
      n = 1, 2, \ldots.

For each n, there is a unique y_n(x) that yields the maximum. If
b < a, we have y_1 \le y_2 \le y_3 \le \ldots, and the reverse
inequalities for b \ge a. In particular, if y_n(x) = x for some n in
the case b \le a, then y_m(x) = x for m \ge n and the solution of the
original equation in (18) will be furnished by y = x.

Let us note finally that if an interior maximum exists we must have
simultaneously

(26)  g'(y) - h'(x-y) + (a-b) f'(ay + b(x-y)) = 0,
      f'(x) = h'(x-y) + b f'(ay + b(x-y)).

These equations may be solved explicitly if g and h are quadratic.
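The pair of conditions for an interior maximum can be obtained in two steps, assuming that f is differentiable and that the maximizing y is a differentiable function of x. Both are assumptions made here for the purpose of the derivation.

```latex
% Write F(x,y) = g(y) + h(x-y) + f(ay + b(x-y)), so that f(x) = \max_y F(x,y).
% Step 1: stationarity in y at an interior maximum.
\frac{\partial F}{\partial y}
  = g'(y) - h'(x-y) + (a-b)\,f'\bigl(ay + b(x-y)\bigr) = 0 .

% Step 2: differentiate f(x) = F(x, y(x)) with respect to x.
% The term (\partial F/\partial y)\,(dy/dx) vanishes by Step 1, leaving
f'(x) = \frac{\partial F}{\partial x}\Big|_{y = y(x)}
      = h'(x-y) + b\,f'\bigl(ay + b(x-y)\bigr) .
```

When g and h are quadratic, g' and h' are linear, so a linear trial form for f' turns these two relations into algebraic equations for its coefficients.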
§7. Approximation Technique: Approximation in Strategy Space.

Up to now we have been discussing conventional approximation
techniques, common to the functional equations that arise in
mathematical physics. Let us now discuss a technique that is
particularly suited to dynamic programming.

In following the above approach we established an equivalence between
the space of all allowable allocations, a strategy space, and the
function space of all conceivable solutions of our functional
equation. An optimal policy yields, by direct computation, the
solution of the functional equation, and conversely, the functional
equation, through its determination of y(x), yields an optimal
sequence of allocations.

It follows then that we have a duality between the strategy space and
the function space with the prerogative of attacking the problem on
the grounds of our own choosing.

This immediately furnishes us with a new, powerful technique for
finding approximate solutions. In place of approximating in function
space we may approximate in strategy space. It is in this way that we
may most efficiently exploit the insight and intuition gained from
experience.

For example, we might argue that the unit cost is the determining
factor and set y = 0 whenever

(27)  [inequality illegible in the source]

and y = x otherwise.

Using this policy, we compute a function f_1(x). This is now used as a
first approximation.

The great advantage of this technique lies in the fact that it ensures
monotone convergence. We know automatically that the next
approximation will yield a superior policy. To demonstrate this, let
f_1(x) be generated by a rule which furnishes y given x. Then

(28)  f_1(x) = g(y) + h(x-y) + f_1(ay + b(x-y)).

It follows that if f_2 is determined by

(29)  f_2(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_1(ay + b(x-y)) ],

we have f_2 \ge f_1 with equality only if f_1 is the actual solution.
Having established that f_2 \ge f_1 it is immediate that f_3 as
determined by

(30)  f_3(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_2(ay + b(x-y)) ]

is greater than or equal to f_2, and inductively that
f_{n+1} \ge f_n for n = 1, 2, \ldots.
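One improvement step in strategy space can be sketched as follows. The initial rule, the truncation of the infinite process after a fixed number of stages, and the trial grid for y are illustrative assumptions, as are the function names. The sketch evaluates the return of a given rule and then performs one maximization using that return as the continuation.

```python
def policy_return(x, rule, g, h, a, b, n_terms=200):
    """Return from following y = rule(x) at every stage.

    The infinite process is truncated after n_terms stages; since the
    quantity shrinks by at least the factor max(a, b) per stage, the
    neglected tail is small for the functions used here.
    """
    total = 0.0
    for _ in range(n_terms):
        y = rule(x)
        total += g(y) + h(x - y)
        x = a * y + b * (x - y)
    return total

def improved_return(x, f_prev, g, h, a, b, k=100):
    """One maximization step with f_prev as the continuation value.

    The trial points include y = 0 and y = x, so a rule that always
    picks an endpoint is among the candidates and cannot be worsened.
    """
    return max(g(y) + h(x - y) + f_prev(a * y + b * (x - y))
               for y in (x * j / k for j in range(k + 1)))
```

With the assumed example g(y) = y, h(u) = 2u, a = b = 0.5 and the deliberately poor rule y = x, the rule returns 2x, while one improvement step already gives 3x. Repeating the step produces a nondecreasing sequence, as the argument in the text asserts.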

The whole point of solving a simple model of a decision problem is not
so much that it furnishes an approximate function, but rather that it
furnishes an approximate policy, which is now used to furnish an
approximate function for a more complicated and realistic problem.
§8. Some Generalizations.

Let us consider several immediate generalizations. We may first of all
consider the case where the return and the cost are both functions of
the stage. The resultant functional equations then have the form

(31)  f_k(x) = \max_{0 \le y \le x} [ g_k(y) + h_k(x-y) + f_{k+1}(a_k y + b_k(x-y)) ].

A more interesting generalization is that where the return is not
determined, but subject to a probability distribution.

Thus, as an illustration, let us assume that if the initial allocation
is y there is a probability p_1 that the return is g_1(y) + h_1(x-y)
and that the quantity remaining is a_1 y + b_1(x-y), and a probability
p_2 = 1 - p_1 that the return is g_2(y) + h_2(x-y) and the quantity
left is a_2 y + b_2(x-y).

Since the return is now a stochastic quantity, it is no longer
possible to speak of maximizing the return, but rather to speak of
maximizing the average value of some function of this return. The
simplest measure is the expected return. Let


(32)  f(x) = expected total return obtained using an optimal policy.

As above, we obtain the functional equation

(33)  f(x) = \max_{0 \le y \le x} [ p_1 { g_1(y) + h_1(x-y) + f(a_1 y + b_1(x-y)) }
                                  + p_2 { g_2(y) + h_2(x-y) + f(a_2 y + b_2(x-y)) } ].
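A finite-stage analogue of the stochastic functional equation can be evaluated by direct recursion when the number of stages and the number of trial allocations are small. The sketch below assumes a list of outcomes, each given as a tuple (p, g, h, a, b) with the probabilities summing to one. This data layout and the function name are illustrative choices, not the paper's notation.

```python
import math

def expected_return(x, n, outcomes, k=4):
    """Maximal expected return over n stages, with y on k+1 trial points.

    outcomes: list of (p, g, h, a, b); with probability p the stage
    returns g(y) + h(x - y) and leaves a*y + b*(x - y).
    The work grows like (len(outcomes) * (k + 1)) ** n.
    """
    if n == 0:
        return 0.0
    best = -math.inf
    for j in range(k + 1):
        y = x * j / k
        val = 0.0
        for p, g, h, a, b in outcomes:
            nxt = a * y + b * (x - y)
            val += p * (g(y) + h(x - y)
                        + expected_return(nxt, n - 1, outcomes, k))
        best = max(best, val)
    return best
```

As a hand-checkable case, suppose that with probability 0.25 the stage pays y + 2(x - y) and with probability 0.75 it pays nothing, with half the quantity remaining in either event. The two-stage expected return from x = 1 is then 0.5 + 0.25 = 0.75.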
Results analogous to those described in the preceding sections hold
for this and the still more general form

(34)  f(x) = \max_{0 \le y \le x} \int [ a(x,y,z) + f(b(x,y,z)) ] \, dG(z,y),

where the distribution of outcomes z depends upon the choice of y.

Functional equations of similar type occur in the work on the optimal
inventory problem of Arrow, Harris, and Marschak, and Dvoretzky,
Kiefer, and Wolfowitz.

§9. A Particular Example.

In the previous sections we have discussed approximation techniques
which are particularly applicable when g and h are either concave or
convex. Since, in applications, curves with points of inflection are
of frequent occurrence, it is of some interest to see what occurs when
g and h have neither of the simple forms.

A particularly simple pair of easily computable functions possessing
points of inflection are

(35)  g(y) = [illegible in the source],  h(y) = [illegible in the source].

The equation now has the form

(36)  f(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f(ay + b(x-y)) ],

with g and h as in (35), and with

(37)  f_1(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) ],
      f_2(x) = \max_{0 \le y \le x} [ g(y) + h(x-y) + f_1(ay + b(x-y)) ].

These functions were computed for various sets of values; the curves
for a = .8, b = .h, r = 1, c = 10, d = 15 are appended.

What is striking is that although the graphs of f(x), f_1(x), f_2(x)
are quite smooth, the graph of y = y(x), the maximizing choice of y,
is quite disjointed. It is probably true, although we have not
verified it, that there exists an approximate strategy which is slowly
varying in x and yields almost as large a total return as the exact
strategy. This property is typical of many problems of this type.